Transcription dosage compensation does not occur in Down syndrome

Background: The increase in DNA copy number in Down syndrome (DS; caused by trisomy 21) has led to the DNA dosage hypothesis, which posits that the level of gene expression is proportional to the gene’s DNA copy number. Yet many reports have suggested that a proportion of chromosome 21 genes are dosage compensated back towards typical expression levels (1.0×). In contrast, other reports suggest that dosage compensation is not a common mechanism of gene regulation in trisomy 21, providing support to the DNA dosage hypothesis.

Results: In our work, we use both simulated and real data to dissect the elements of differential expression analysis that can lead to the appearance of dosage compensation, even when compensation is demonstrably absent. Using lymphoblastoid cell lines derived from a family with an individual with Down syndrome, we demonstrate that dosage compensation is nearly absent at both nascent transcription (GRO-seq) and steady-state RNA (RNA-seq) levels. Furthermore, we link the limited apparent dosage compensation to expected allelic variation in transcription levels.

Conclusions: Transcription dosage compensation does not occur in Down syndrome. Simulated data containing no dosage compensation can appear to have dosage compensation when analyzed via standard methods. Moreover, some chromosome 21 genes that appear to be dosage compensated are consistent with allele-specific expression.

Supplementary information: The online version contains supplementary material available at 10.1186/s12915-023-01700-4.


Fig S 1: QC metrics of RNA-seq and GRO-seq datasets. (Top) RSeQC read distribution plots of all datasets, showing relative abundance of reads at each listed genomic feature. (Bottom) FastQC plots showing number of total reads and proportion of duplicate reads for each dataset.

Fig S 2: Sample distance between GRO-seq and RNA-seq datasets. Euclidean distance between (Left) RNA-seq datasets and (Right) GRO-seq datasets, including replicates. The letters A, B, C refer to biological replicates. For more information on samples, see the methods section.

Fig S 3: Cumulative distribution plot of fold changes of each chromosome. All diploid chromosomes in blue; chromosome 21 in red. Dosage compensated genes are often identified using a fold change cutoff, indicated by each vertical dotted line (red: 1.5× fold change; blue: 1.0× fold change); however, the reduced fold change estimates of these genes can also be explained by biological or technical variance. Left: GRO-seq. Right: RNA-seq.

Fig S 6: Volcano plots of GRO-seq results. Chromosome 21 genes are indicated in blue. LFC = 0.58 is indicated by the blue vertical dotted line. Padj = 0.01 is indicated by the red horizontal dotted line. Counts are from raw mapped reads (includes multi-mapped reads).

Fig S 7: Volcano plots of RNA-seq results. Chromosome 21 genes are indicated in blue. LFC = 0.58 is indicated by the blue vertical dotted line. Padj = 0.01 is indicated by the red horizontal dotted line. Counts are from raw mapped reads (includes multi-mapped reads).

Fig S 8: Fold change estimates (T21 vs D21) across expression levels in GRO-seq (top) and RNA-seq (bottom). Genes are grouped by expression quantiles. Lower quantiles, on the left, contain the genes with the lowest expression (and therefore the genes with the highest technical variability in measurements). Median fold changes for each group are indicated with an orange line. The lowest expression quantile has a noticeably lower fold change estimate for chromosome 21 genes. On the far right (labeled 1-100), the figure shows all genes in one violin plot regardless of expression levels.
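
The grouping described in this caption can be sketched as follows. This is a minimal illustration, not the analysis code behind the figure: the function name `median_fc_by_quantile`, the equal-sized rank-based bins, and the default of four bins are all assumptions made for the example.

```python
from statistics import median


def median_fc_by_quantile(expression, fold_change, n_bins=4):
    """Median fold change per expression bin (hypothetical helper).

    Genes are ranked by expression and split into n_bins groups of
    (nearly) equal size, lowest expression first. Requires at least
    n_bins genes so that no bin is empty.
    """
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    medians = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        end = (b + 1) * len(order) // n_bins
        medians.append(median(fold_change[i] for i in order[start:end]))
    return medians
```

Comparing the first entry of the returned list with the later ones is one way to check whether the lowest expression bin has a lower median fold change than the rest.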

Fig S 13: Fold-change distributions of simulated T21 and D21 data sets with high dispersion, varying the depth and replication number of the samples. Simulations used high dispersion estimates (asymptotic dispersion = 0.08, extra-Poisson noise = 8), and depth was changed relative to our D21 RNA-seq data. Median fold changes are indicated with orange lines (see also Supplemental Fig 1). Left: low depth (1x) and low replication (n=3). Right: high depth (3x) and high replication (n=12).
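
To make the effect of depth and replication concrete, the sketch below simulates genes whose true T21/D21 fold change is exactly 1.5 (no compensation) and reports how many still fall below a cutoff. It is a rough stand-in for the simulations in the figure, not a reproduction of them. The dispersion values 0.08 and 8 come from the caption, read here as a and b in a + b/µ; everything else is an assumption for illustration: the negative binomial model drawn as a gamma-Poisson mixture, the gene count, the 20 to 300 range of mean counts, the 0.5 pseudocount, and all function names.

```python
import math
import random
import statistics


def dispersion(mu, a, b):
    """Dispersion trend of the form a + b / mu."""
    return a + b / mu


def poisson(lam, rng):
    """Poisson draw via Knuth's method, chunked so exp(-lam) never underflows."""
    total = 0
    while lam > 0:
        step = min(lam, 500.0)
        limit = math.exp(-step)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= step
    return total


def neg_binomial(mu, disp, rng):
    """Negative binomial draw (mean mu, variance mu + disp * mu**2)."""
    lam = rng.gammavariate(1.0 / disp, mu * disp)
    return poisson(lam, rng)


def simulate_fold_changes(n_genes, n_reps, depth, a=0.08, b=8.0, seed=0):
    """Estimated T21/D21 fold changes when the true fold change is 1.5."""
    rng = random.Random(seed)
    fold_changes = []
    for _ in range(n_genes):
        # Hypothetical spread of mean counts, scaled by sequencing depth.
        d21_mu = depth * math.exp(rng.uniform(math.log(20), math.log(300)))
        t21_mu = 1.5 * d21_mu
        d21 = [neg_binomial(d21_mu, dispersion(d21_mu, a, b), rng)
               for _ in range(n_reps)]
        t21 = [neg_binomial(t21_mu, dispersion(t21_mu, a, b), rng)
               for _ in range(n_reps)]
        # Pseudocount of 0.5 guards against division by zero.
        fold_changes.append(
            (statistics.mean(t21) + 0.5) / (statistics.mean(d21) + 0.5))
    return fold_changes


def fraction_below(fold_changes, cutoff):
    """Share of genes whose estimated fold change is below the cutoff."""
    return sum(fc < cutoff for fc in fold_changes) / len(fold_changes)
```

For example, `fraction_below(simulate_fold_changes(300, 3, 1.0), 1.25)` can be compared with `fraction_below(simulate_fold_changes(300, 12, 3.0), 1.25)`. Under these assumptions the low-depth, low-replication setting should leave more genes looking "compensated" even though none are.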

Fig S 14: Power analysis of a "dosage-compensated" gene using dispersion parameters of normalized trisomic data. The proportion of unchanging (non-compensated) genes was set at 0.9. Dispersion was estimated as in DESeq2 (a + b/µ, estimated here at a = 0.03 and b = 5). (Left) Power analysis with different mean expression levels, denoted by µ, with an effect size of 0.5 (a fold change of 1.5, or a log2 fold change of ±0.58). (Right) Power analysis with differing effect sizes, with a mean expression level µ = 200. Power graphs were generated with the R library ssizeRNA (v1.3.2) [1].
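
As a worked illustration of the numbers in this caption, the sketch below evaluates the dispersion trend a + b/µ with a = 0.03 and b = 5 and converts the 1.5 fold change to log2 units. The helper names are hypothetical, and the coefficient-of-variation formula assumes a negative binomial count model (variance µ + dispersion · µ²); the power curves themselves come from ssizeRNA and are not reproduced here.

```python
import math


def dispersion(mu, a=0.03, b=5.0):
    """Dispersion trend a + b / mu, with the caption's a and b as defaults."""
    return a + b / mu


def count_cv(mu, a=0.03, b=5.0):
    """Coefficient of variation of a negative binomial count.

    With variance mu + dispersion * mu**2, CV**2 = 1/mu + dispersion.
    """
    return math.sqrt(1.0 / mu + dispersion(mu, a, b))


# A 1.5-fold change on the log2 scale (about 0.58).
LOG2_FC_TRISOMY = math.log2(1.5)
```

At µ = 200 the trend gives a dispersion of 0.03 + 5/200 = 0.055. Lower means give larger dispersion and larger relative noise, which is why power falls for weakly expressed genes.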

Fig S 16: Differential analysis with adjusted alternative hypothesis (|LFC| < log2(1.5)). Significant genes (red) are those which are significantly below the expected value of 1.5 (dotted blue line). Median fold change for each plot is indicated with an orange line. Left: GRO-seq, 56 significant genes. Right: RNA-seq, 20 significant genes.
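
One generic way to test an alternative hypothesis of the form |LFC| < log2(1.5) is a pair of one-sided Wald tests, sketched below. The caption does not say which statistic or software option was used, so this should be read as an illustration under a normal approximation rather than the authors' procedure. The function names and the inputs (a log2 fold change estimate and its standard error) are assumptions.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_less_abs(lfc, se, threshold=math.log2(1.5)):
    """p-value for the alternative |LFC| < threshold.

    Two one-sided tests: the estimate must lie significantly below
    +threshold and significantly above -threshold. The reported
    p-value is the larger of the two.
    """
    z_above = max((threshold - lfc) / se, 0.0)
    z_below = max((lfc + threshold) / se, 0.0)
    return max(upper_tail(z_above), upper_tail(z_below))
```

A gene whose estimate sits exactly at log2(1.5) gets a p-value of 0.5 under this sketch. Only estimates that are well inside the interval relative to their standard error reach significance.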

Fig S 18: Fold-change estimates of simulated T21 and D21 data sets, using low and high dispersion estimates. Fold-change estimates were adjusted to account for trisomy, as in Fig 2G. Low dispersion: a = 0.01, b = 1. High dispersion: a = 0.05, b = 30. Median fold changes are indicated by orange lines.

Fig S 19: Cumulative distribution plots of fold changes in trisomy-aware RNA-seq and GRO-seq analyses. The dotted red line indicates a log2 fold change of 0. The vertical dotted blue line indicates a log2 fold change of -log2(1.5). (A) CDF of RNA-seq fold changes on all chromosomes (blue solid lines), with chromosome 21 indicated in red. (B) Same as (A), but using GRO-seq data. (C) CDF of GRO-seq fold changes using chromosome 21 genes (red) and a random subsample of an equivalent number of non-chromosome 21 genes (blue), which normalizes for chromosome size.
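
The size-matched comparison in panel (C) can be sketched as an empirical CDF plus a random subsample drawn without replacement. This is a minimal illustration with hypothetical function names and a fixed seed chosen for reproducibility; it does not reproduce the authors' plotting code.

```python
import bisect
import random


def ecdf(values):
    """Return a function giving the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def size_matched_subsample(background, n, seed=0):
    """Draw n background fold changes without replacement."""
    return random.Random(seed).sample(background, n)
```

Given a list of chromosome 21 log2 fold changes and a list for all other genes, evaluating `ecdf(chr21)` and `ecdf(size_matched_subsample(other, len(chr21)))` on the same grid gives two curves built from equal numbers of genes.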

Fig S 20: Violin plots depicting fold change estimates for chromosome 20 and 21 genes between T21 and D21 GRO-seq datasets. (Left) Default analysis not accounting for the ploidy of the samples. (Right) Adjusted analysis correcting for ploidy differences between the samples. Median fold changes are indicated by orange lines.

Fig S 21: UpSet plots indicating overlap of significant gene calls on chromosome 21 when comparing two disomic individuals. We compare the D21 brother to the father and identify statistically significant chromosome 21-encoded genes (padj < 0.01). Three cases are shown: normalizing to the entire family (no samples removed), removing the T21 sample, or removing the mother sample (as a control).

Fig S 22: Sankey diagram indicating explanations of fold changes for chromosome 21 genes in GRO-seq (for RNA-seq, see Fig. 3D). Nearly all genes with a fold change lower than 1.5 can be explained by technical factors, leaving only 8 genes whose expression is below expectation and not accounted for by genetic variation.